Quaero: Motivation, Summary, Status

Bruce Knuteson 
Enrico Fermi Institute, University of Chicago

Quaero is a web-based tool that automates high-pT analyses. It has been designed with the goals 
of expunging exclusion contours from conference talks, obviating the necessity of "uncorrecting" 
experimental results, reducing human bias in experimental measurements, reducing by orders of 
magnitude the time required to perform analyses, allowing the publication of collider data in their 
full dimensionality, rigorously propagating systematic errors, dramatically increasing the robustness 
of experimental results, and facilitating the combination of results among different experiments. 
Quaero has been used to make a subset of D0 Run I data publicly available, and is being explored 
as a means of putting LEP data at your fingertips. These proceedings review the motivation for 
Quaero, summarize the key enabling ideas, and provide a snapshot of the project's present status. 
I. MOTIVATION

Current practice for testing models against collider data can be significantly improved on many fronts.

A. Exclusion plots

Take as an example the way in which the results of searches for physics beyond the standard model are typically presented. Starting with an enormous model space, such as the 105 parameters in the MSSM, ad hoc assumptions are imposed in order to restrict the number of free parameters to two (this being the dimensionality of the sheet of paper on which the result will be published), and limits are placed on the two unfixed parameters.

FIG. 1: A collage of exclusion plots shown in an hour's worth of a typical conference, in this case Topics in Hadron Collider Physics 2002, Thursday, Oct 10, from 4-5pm [?].


*URL: http://hep.uchicago.edu/~knuteson/ Electronic address: 
knuteson@fnal.gov 



Conference audiences are then inundated with the resulting exclusion plots. The collage shown in Fig. 1 represents an hour's worth of a typical conference, in this case Topics in Hadron Collider Physics 2002, Thursday, Oct 10, from 4-5pm.

Exclusion plots such as these are inherently confusing and basically useless. They are inherently confusing because it is essentially impossible to tell exactly what model is being tested, including all assumptions that are made; in many cases this is not even clear to the author. They are basically useless because it is nearly impossible to tell from the exclusion plot what the data have to say about some other model that does not happen to lie in the two-dimensional parameter space shown.
B. Full dimensionality

The results of analyses are also sometimes displayed 
by showing histograms of data and comparing with the 
predictions of several models. This is clearly better, but 
limited by the fact that the data are inherently multi- 
dimensional, while histograms published in journals are 
inherently one- (or perhaps two-) dimensional. Lots of 
information is lost in the projection. 

Consider for example the simplest conceivable final state at the Fermilab Tevatron, arising from the process p p̄ → Z/γ* → e+e−. To first order, three quantities are sufficient to completely characterize each event: these can be taken to be the invariant mass of the two electrons (m_ee), the polar angle of the positron (cos θ), and the transverse momentum of the e+e− pair (p_T^ee). No existing publication contains the three-dimensional information needed to optimally test a hypothesis against even this simple data set. Indeed, even viewing just the three one-dimensional projections requires looking in three different publications. In the case of CDF, see Ref. [?] for m_ee, Ref. [?] for cos θ, and Ref. [3] for p_T^ee.

In the case of D0, see Ref. [?] for m_ee and Ref. [?] for p_T^ee, and let me know if you find a D0 publication containing the distribution of cos θ.
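To make the three quantities concrete, the following Python sketch computes m_ee, the positron's cos θ, and p_T^ee from the two lepton four-vectors. It is a hypothetical helper and is not taken from any published analysis. One assumption deserves mention: it measures the polar angle in the laboratory frame, whereas a real measurement might use a different frame (for example, the Collins-Soper frame).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FourVector:
    """Energy and momentum components, in GeV."""
    E: float
    px: float
    py: float
    pz: float

    def __add__(self, other: "FourVector") -> "FourVector":
        return FourVector(self.E + other.E, self.px + other.px,
                          self.py + other.py, self.pz + other.pz)

    @property
    def p(self) -> float:
        return math.sqrt(self.px ** 2 + self.py ** 2 + self.pz ** 2)

    @property
    def pt(self) -> float:
        return math.hypot(self.px, self.py)

    @property
    def mass(self) -> float:
        # Guard against tiny negative values from rounding.
        return math.sqrt(max(0.0, self.E ** 2 - self.p ** 2))

    @property
    def cos_theta(self) -> float:
        # Lab-frame polar angle measured from the beam (z) axis.
        return self.pz / self.p if self.p > 0 else 0.0


def dilepton_observables(positron: FourVector, electron: FourVector):
    """Return (m_ee, cos theta of the positron, p_T of the e+e- pair)."""
    pair = positron + electron
    return pair.mass, positron.cos_theta, pair.pt


# Hypothetical event: back-to-back leptons transverse to the beam.
m_ee, cos_theta, pt_ee = dilepton_observables(
    FourVector(45.0, 45.0, 0.0, 0.0), FourVector(45.0, -45.0, 0.0, 0.0))
print(f"m_ee = {m_ee:.1f} GeV, cos(theta) = {cos_theta:.2f}, pT_ee = {pt_ee:.1f} GeV")
```

This example prints `m_ee = 90.0 GeV, cos(theta) = 0.00, pT_ee = 0.0 GeV`. If each event is kept as the triple (m_ee, cos θ, p_T^ee) rather than being split into three separate histograms, the correlations that any one-dimensional projection discards are preserved.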



C. Uncorrecting 

When publishing histograms, there is the further com- 
plication that the natural variables in which to display 
results are quantities measured by the experiment. Com- 
parison with the underlying theory, however, is facilitated 
if the results can be published in terms of the partons 
emerging from the hard scattering. As a result, a great 
deal of effort is often expended in so-called "uncorrect- 
ing" ("unfolding," "unsmearing," ...) procedures, which 
attempt to invert the function represented by the detec- 
tor simulation. This is a futile enterprise — the detector 
simulation is a function easily and naturally understood 
in the forward direction in terms of a Monte Carlo propa- 
gation of particles obeying well-known laws of scattering 
and energy deposition, but awkwardly inverted in all but 
the most trivial detectors, and any uncorrecting is gener- 
ally inapplicable beyond the immediate use for which it 
was painstakingly developed. The natural place to com- 
pare the results of theory with experiment is in terms of 
the quantities observed in the detector. 
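The following Python sketch illustrates the forward direction favored above. A parton-level theory prediction is folded through a detector response matrix, and the result is compared with the observed counts using a Poisson likelihood. The two-bin response matrix, the two theory predictions, and the observed counts are all made-up numbers for illustration; a real detector simulation is a Monte Carlo, not a small matrix.

```python
import math


def fold(truth, response):
    """Forward-fold a parton-level histogram into detector-level bins.

    response[i][j] is the probability that an event generated in truth bin j
    is reconstructed in detector bin i; a column may sum to less than one to
    represent inefficiency.  This is the easy, forward direction.
    """
    return [sum(r_ij * t_j for r_ij, t_j in zip(row, truth)) for row in response]


def poisson_log_likelihood(observed, expected):
    """log p(observed | expected) for independent Poisson-distributed bins."""
    return sum(n * math.log(mu) - mu - math.lgamma(n + 1)
               for n, mu in zip(observed, expected))


# Made-up numbers: one detector response, two competing theory predictions.
response = [[0.8, 0.1],
            [0.2, 0.9]]
theory_a = [10.0, 30.0]
theory_b = [20.0, 20.0]
observed = [12, 28]

reco_a = fold(theory_a, response)
reco_b = fold(theory_b, response)
log_ratio = (poisson_log_likelihood(observed, reco_a)
             - poisson_log_likelihood(observed, reco_b))
print(f"expected A: {[round(x, 2) for x in reco_a]}, "
      f"expected B: {[round(x, 2) for x in reco_b]}, log L_A/L_B = {log_ratio:.2f}")
```

Here the observed counts favor theory A, because the log-likelihood ratio is positive. At no point does the calculation require inverting the response matrix.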



D. Human bias 

Another issue deserving attention is how a set of cuts can possibly be chosen without bias. Figure 2 shows a typical scenario, in which background populates a region in the lower left in a space of two observables, and signal populates a region in the upper right. Simulated background events are shown as ×, simulated signal events are shown as small balls, and events observed in the data are shown as large dots. Depending upon how much one believes there is signal in these data, one could choose the dashed curve (believer) or the dotted curve (disbeliever) to separate signal from background. The difference between the two curves in this case is as simple and seemingly innocuous as the number of nodes used in the hidden layer of a neural network.

FIG. 2: How can a set of unbiased cuts be chosen? In a space of two observables, simulated background events (×) lie toward the lower left, simulated signal events (small balls) toward the upper right; events seen in the data are shown as large dots. Whether the dotted or dashed contour is used to separate signal from background is subject to subtle human bias; the difference can be as simple and as seemingly innocuous as the number of nodes used in the hidden layer of a neural network.



E. Time 

The testing of hypotheses against data collected by 
large particle physics collaborations these days usually 
follows a rather elongated time line. An example on the 
quick side of average: 

Jan 1, 2002. Theorist wakes up with a hangover 
and a brilliant idea. 

Mar 15, 2002. Theorist runs into a long-time 
experimental colleague at XXXVII Rencontres de 
Moriond. The experimentalist, in a moment of 
weakness, decides his theoretical friend may be on 
to something. He returns to his home institution, 
and excites his graduate student about the idea. 

Jun 7, 2002. The graduate student finishes his 
classes, passes his exams, and heads off to the ex- 
periment. 
Sep 1, 2002. The graduate student has mastered
the experiment's analysis tools and offline frame- 
work, and plunges with gusto into the analysis. 

Jan 1, 2003. Theorist wakes up with a hangover. 

Jun 1, 2003. After overcoming various hurdles 
and writing ten thousand lines of code to imple- 
ment a particularly clever algorithm, the student 
has the analysis fairly well in hand, and has ob- 
tained a preliminary result. 

Dec 31, 2003. The student's advisor being a re- 
spected and active member of the collaboration, the 
collaboration review process has sped through at 
an unprecedented clip, converging in three months. 
The journal referees responded promptly and with 
few comments, allowing publication in the final is- 
sue of the year. 

Jan 1, 2004. Theorist wakes up, reads the article, 
and struggles to remember why this seemed like 
such a good idea. 

Most of us would really prefer something more closely 
resembling: 

11:52am. Physicist has an idea. 

11:56am. Physicist enters idea into his terminal. 

12:01pm. Physicist heads for lunch. 

12:47pm. Physicist receives email quantifying the 
extent to which the data favor (or disfavor) his idea. 

12:52pm. Physicist comes back from lunch to find 
results waiting for him. 

Reducing the time required to perform an analysis from 
two years to one hour corresponds to a speedup of over 
four orders of magnitude. 
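As a rough check of this figure, take the two-year timeline above (Jan 1, 2002 to Jan 1, 2004) and a one-hour turnaround, ignoring leap days:

$$\frac{2\ \mathrm{yr}}{1\ \mathrm{h}} \approx \frac{2 \times 365 \times 24\ \mathrm{h}}{1\ \mathrm{h}} = 17\,520 \approx 10^{4.24}$$

This is indeed just over four orders of magnitude.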



F. Systematic errors 

A convenient scheme for assigning and propagating systematic errors in our analyses has also been lacking; the approach taken is in some cases laughably ad hoc. This leads to the quoting of inflated systematic errors (defended as "conservative"), resulting in a less sensitive test of the model under consideration. Gaussian errors are nearly always assumed for convenience of calculation. Any two errors are treated as either completely correlated or completely uncorrelated (see, e.g., Ref. [?]). Propagation through anything more complicated than addition or multiplication rarely prompts the student to go to the trouble of differentiating the expression to see how the errors should be propagated; if he does, he is likely to get it wrong.
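The Python sketch below contrasts first-order propagation with numerical propagation. It takes two uncorrelated inputs through a nonlinear function both ways. The function a·exp(−b), the input values, and the Gaussian sampling are assumptions chosen for illustration only.

```python
import math
import random
import statistics


def linearized_sigma(f, x, sigmas, h=1e-6):
    """First-order propagation for uncorrelated inputs, with the derivatives
    taken numerically by central differences."""
    var = 0.0
    for i, s in enumerate(sigmas):
        up = list(x)
        up[i] += h
        dn = list(x)
        dn[i] -= h
        dfdx = (f(*up) - f(*dn)) / (2 * h)
        var += (dfdx * s) ** 2
    return math.sqrt(var)


def sampled_mean_sigma(f, x, sigmas, n=100_000, seed=1):
    """Numerical propagation: draw each input independently from a Gaussian
    (an assumption; any distribution, or correlated draws, could be used)
    and return the mean and spread of f."""
    rng = random.Random(seed)
    values = [f(*[rng.gauss(m, s) for m, s in zip(x, sigmas)]) for _ in range(n)]
    return statistics.fmean(values), statistics.stdev(values)


def attenuated_yield(a, b):
    # Hypothetical nonlinear quantity: a yield a attenuated by exp(-b).
    return a * math.exp(-b)


x, sigmas = [10.0, 1.0], [1.0, 0.5]
sigma_lin = linearized_sigma(attenuated_yield, x, sigmas)
mean_mc, sigma_mc = sampled_mean_sigma(attenuated_yield, x, sigmas)
print(f"central value {attenuated_yield(*x):.2f} +- {sigma_lin:.2f} (linearized)")
print(f"sampled mean  {mean_mc:.2f} +- {sigma_mc:.2f} (numerical)")
```

With these inputs the linearized error is about 1.9. The sampled spread is about 2.3, and the sampled mean shifts upward from 3.68 to about 4.2. The first-order formula misses both effects, since both come from the curvature of exp(−b).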



G. Robust results 

Frequently, completely different analyses are per- 
formed for the testing of different models, even when 
the same subset of data is utilized. More problematic 
than the inefficiency caused by this duplication of effort 
is the resulting difficulty of validation. Each graduate 
student writes his own code for the manipulation of the 
data and backgrounds — code that is used only for the 
purpose of one analysis, and therefore validated only to 
a limited extent through the obvious cross-checks that 
are performed in order to justify the results obtained. 
Ascertaining the correctness of an analysis down to the 
level of potential bugs in the code thereby requires a 
Herculean effort on the part of the reviewing committee, 
which rarely spends substantial time digging through the 
student's C++. The vast amounts of time typically spent 
optimizing a particular analysis generally decrease the
robustness of the scientific conclusion, as bugs multiply 
most fervently in complex and clever code. 

H. Combining results 

We fall down also on the combination of results, both 
for results from different subsets of the data within a 
given experiment, and for results from different experi- 
ments. At the Tevatron in particular, there is a history 
of eschewing the painful process of combining results be- 
tween CDF and D0; the LEP experiments have been 
significantly more successful on this front. The combina- 
tion of experimental results is thus frequently performed 
by some theorist sitting in the back row of a conference, 
adding the quoted errors in quadrature (plus a little bit) 
and determining the combined answer. The existence of 
a well-defined and well-oiled mechanism for combining re- 
sults — not only between the two Tevatron experiments, 
but also simultaneously with the experiments at LEP and 
at HERA — would be welcome. 
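For contrast with the back-row approach, the following Python sketch implements the standard best linear unbiased estimate (BLUE). It combines two measurements of the same quantity that have Gaussian errors and correlation coefficient ρ. The function name and the example numbers are hypothetical. The point is that the correlation between two results changes the combined error, and that correlation cannot be read off the published numbers.

```python
import math


def combine_two(x1: float, s1: float, x2: float, s2: float, rho: float = 0.0):
    """Best linear unbiased estimate from two measurements x1 +- s1 and
    x2 +- s2 with correlation coefficient rho (Gaussian errors assumed).
    Undefined when s1 == s2 and rho == 1 (the measurements are identical)."""
    denom = s1 ** 2 + s2 ** 2 - 2.0 * rho * s1 * s2
    w1 = (s2 ** 2 - rho * s1 * s2) / denom
    mean = w1 * x1 + (1.0 - w1) * x2
    variance = s1 ** 2 * s2 ** 2 * (1.0 - rho ** 2) / denom
    return mean, math.sqrt(variance)


# Hypothetical measurements of the same quantity by two experiments.
for rho in (0.0, 0.5):
    mean, sigma = combine_two(10.0, 2.0, 12.0, 3.0, rho)
    print(f"rho = {rho}: {mean:.2f} +- {sigma:.2f}")
```

In this example the combined error grows when ρ = 0.5. Adding quoted errors in quadrature without knowing the correlations can therefore misstate the precision of a combination.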



I. Wish list 

My personal analysis wish list therefore looks something like the following [?]:

• Expunge exclusion contours from conference talks 

• Obviate the necessity of "uncorrecting" 

• Reduce human bias 

• Reduce analysis time by a factor of 10⁴

• Publish data in their full dimensionality 

• Rigorously propagate systematic errors 

• Increase the robustness of results 

• Easily combine results among different experiments

• All of this on the web



II. QUAERO: D0 RUN I

A first solution to these desiderata has been achieved, in the form of an algorithm named Quaero (Latin for "I search for, I seek"). Using Quaero, D0 has made a subset of its Run I data publicly available for use by the scientific community [?]. This represents the first such attempt by a high energy collider collaboration.



A. Interface

The first incarnation of Quaero, enabling searches for new phenomena at D0 and the calculation of their cross sections (or limits thereon), has been available at http://quaero.fnal.gov/ since the summer of 2001. Using the web interface shown in Fig. 3, any high energy physicist can test a model against a subset of D0 Run I data, obtaining results within an hour.

FIG. 3: The Quaero interface for D0 Run I (http://quaero.fnal.gov/).

A physicist keen on a particular model begins by selecting the appropriate final state (Final State) in the interface in Fig. 3. The final states made available through Quaero at D0 are those with an electron and a muon; with two electrons and two or more jets; and with an electron, missing transverse energy, and two or more jets.

The physicist then provides the events predicted by the model. These can be given either as commands to Pythia (Pythia Input), which Quaero will use to generate events, or as a file containing the four-vectors of the predicted events (Signal File), together with the cross section of the process (xsec). The physicist has the option of viewing signal, standard model background, and data (View), or asking Quaero to perform an optimized search for this particular signal (Search). Individual background processes can be left out of the background estimate, if desired, by unchecking the appropriate box (Backgrounds).

The physicist can provide explicit cuts (Constraints) and up to three variables (Variables) to distinguish signal from background. Variables are written in a simple syntax; examples include e_pt, j1_phi, met_pt, mass(e1,e2), transversemass(e,met), and (j1+j2)_pt. More complicated variables mixing four-vector quantities and standard C syntax, such as sqrt(j1_phi+aplanarity())*exp(-fabs(j2_eta)), can also be used. A complete description of the possibilities is given in a manual available from the web page.

After keying in his name, his institution, the email address to which the result should be sent, and a brief description of the model to be tested, the physicist hits the Submit button and heads for lunch.



B. Algorithm

Quaero takes the events the physicist has provided (generating them, if given in the form of Pythia commands), runs them through a parameterized simulation of the D0 detector, and retains those that land in the desired final state. In doing so it correctly accounts for the efficiency of object identification and the geometric acceptance of the detector. Density estimates p(v|S) and p(v|B) are obtained for the signal S and background B in the variable space V ∋ v provided, and are used to carve out a signal-rich region. Within this selected region, the number of events observed in the data is compared to the numbers of events expected from S and from B, and constraints on the cross section of the signal are determined.
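The steps just described can be sketched in Python. This is a hypothetical one-dimensional illustration under stated assumptions, not Quaero's code. The densities p(v|S) and p(v|B) are replaced by simple histograms. The signal-rich region is chosen by ranking bins in p(v|S)/p(v|B) and keeping the set that maximizes s/√(s+b). The limit is a classical Poisson upper limit that ignores the background uncertainty. All names (cross_section_limit, s_per_pb, and so on) are invented for this sketch.

```python
import math


def histogram_density(samples, edges):
    """Normalized histogram estimate of p(v) (a simple stand-in for whatever
    density estimator Quaero actually uses)."""
    counts = [0] * (len(edges) - 1)
    for v in samples:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def select_region(p_sig, p_bkg, s_total, b_total):
    """Rank bins by p(v|S)/p(v|B) and keep the highest-ranked bins that
    maximize the figure of merit s/sqrt(s+b) (an assumed criterion)."""
    def ratio(i):
        if p_bkg[i] > 0:
            return p_sig[i] / p_bkg[i]
        return math.inf if p_sig[i] > 0 else 0.0

    order = sorted(range(len(p_sig)), key=ratio, reverse=True)
    best_region, best_fom, s, b = [], -1.0, 0.0, 0.0
    for k, i in enumerate(order):
        s += s_total * p_sig[i]
        b += b_total * p_bkg[i]
        fom = s / math.sqrt(s + b) if s + b > 0 else 0.0
        if fom > best_fom:
            best_fom, best_region = fom, order[:k + 1]
    return best_region


def poisson_cdf(n, mu):
    """P(N <= n) for a Poisson mean mu."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total


def upper_limit(n_obs, b, cl=0.95):
    """Classical upper limit on the signal mean s: P(N <= n_obs | s + b) = 1 - cl."""
    lo, hi = 0.0, 1.0
    while poisson_cdf(n_obs, hi + b) > 1 - cl:
        hi *= 2
    for _ in range(100):  # bisection
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid + b) > 1 - cl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cross_section_limit(sig, bkg, data, edges, s_per_pb, b_total, cl=0.95):
    """Upper limit (pb) on the signal cross section. s_per_pb is the number of
    signal events expected for a 1 pb cross section; b_total is the total
    expected background."""
    p_s = histogram_density(sig, edges)
    p_b = histogram_density(bkg, edges)
    region = select_region(p_s, p_b, s_per_pb, b_total)
    eff = sum(p_s[i] for i in region)  # fraction of signal kept
    b_exp = b_total * sum(p_b[i] for i in region)
    n_obs = sum(1 for v in data
                if any(edges[i] <= v < edges[i + 1] for i in region))
    return upper_limit(n_obs, b_exp, cl) / (s_per_pb * eff)


# Toy usage with made-up samples.
edges = [0.0, 100.0, 200.0, 300.0, 400.0]
signal = [320.0, 340.0, 350.0, 360.0, 250.0]
background = [50.0, 80.0, 120.0, 150.0, 180.0, 220.0, 260.0, 330.0]
data = [60.0, 90.0, 130.0, 210.0, 340.0]
limit = cross_section_limit(signal, background, data, edges, s_per_pb=5.0, b_total=30.0)
print(f"95% CL upper limit: {limit:.2f} pb")
```

In practice several of these choices would differ. Smooth multivariate density estimates would stand in for histograms, and background uncertainties would be folded into the limit. The overall flow is the same, however: estimate the densities, select a region, count, and limit.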



C. Results 

The resulting constraints on the signal cross section, together with plots of the variables Quaero used to determine them, are returned to the physicist by email. An email returned for an actual Quaero request is shown in Fig. 4; the plot of the variable used is shown in Fig. 5.

    From: quaero@fnal.gov
    Subject: Quaero Request #29

    Final State
    e met 2j(nj)

    Constraints
    ChiSqdConstrain(e,met,W)<10

    Variables
    mass(e,met,j1,j2)

    ModelDescription
    Right-handed W's coupling to t-bbar.

    pythia commands
    msub(142)=1       select f fbar -> W'
    pmas(34,1)=350.   set W' mass to 350 GeV
    paru(131)=1       vector coupling of q/qbar to W'
    paru(132)=1       axial coupling of q/qbar to W'
    paru(133)=1       vector coupling of lepton/neutrino to W'
    paru(134)=1       axial coupling of lepton/neutrino to W'

    Result
    pythia cross section x branching ratio = 1.68 pb.
    upper limits on the cross section to this process at confidence levels of 50%, 90%, and 95% are found to be 0.? pb, 1.8 pb, and 2.1 pb, respectively. Maximal sensitivity (0.73 pb^-1) is achieved in a region of variable space with 17.6 signal events expected, 32.7 +- 7.1 background events expected, and 36 events observed in the data.

    Plots
    Plots of the variables that you used are available for viewing at http://quaero.fnal.gov/quaero/requests/plots/29.ps. The red curve is the expected background; the green curve is your signal multiplied by a factor of 10; the black dots are D0 data.

FIG. 4: An example of an email returned with the result of a Quaero analysis. This was request #29, early in the life of Quaero; the current ticket number is over three hundred. The requesting physicist chose the final state containing one electron, missing transverse energy, and two or more jets (e met 2j(nj)). The electron and inferred neutrino are constrained to a W boson (ChiSqdConstrain(e,met,W)<10), and the invariant mass of the electron, neutrino, and two leading jets (mass(e,met,j1,j2)) is used to distinguish signal from background. The signal of interest is W′ → t b̄ → e ν b b̄, provided to Quaero in the form of commands to the Pythia event generator in standard notation. No evidence for new physics is observed in this case, so Quaero returns limits on the cross section of this process at various levels of confidence. Also provided are the numbers of signal and background events expected in the region selected by Quaero, the number of events observed in the data in that region, and a link to a plot of the variable used, shown in Fig. 5.



The data, backgrounds, and analysis procedure hav- 
ing been thoroughly reviewed by the D0 collaboration, 
the answer comes to the querying physicist as an official, 
D0-approved result. The result can be published in the 
querying physicist's own paper, sans D0 author list. A 
Physical Review Letter describing Quaero and its application to eleven thesis-level analyses, the results of which are provided in Table I, has been published [?]. Roughly
half of these analyses can be directly compared with pre- 
vious CDF and D0 results; as expected, Quaero is 
found to be correct and competitive in all cases. 





FIG. 5: Distribution of background (dark histogram), signal 
(light histogram) multiplied by a factor of 10, and D0 Run I 
data (solid points). The variable shown is the invariant mass 
of the electron, inferred neutrino, and two leading jets, after 
constraining the electron and neutrino to a W boson. The 
signal peaks at the assumed mass of the W′, in this case
350 GeV. 



III. QUAERO: LEP RUN II 

If a hint of new physics is revealed in Tevatron Run 
II, it is almost guaranteed that we will be unable to de- 
termine from the Tevatron data alone the nature of that 
new physics. Unraveling the clues the Tevatron provides 
will require access to all high energy collider data at our 
disposal. The LEP II data, in particular, may in fact 
be more valuable in helping us to disentangle a Tevatron 
hint than the rest of the Tevatron data. If in two years a 
hint is observed at the Tevatron, hep-ph will be flooded 
with models attempting to explain the observation. In 
this event, having the ability to painlessly and quickly 
test many specific hypotheses against the LEP data will 
be invaluable. Although there may be little more to be 
learned from the LEP data now, there could be a great 
deal to be learned from the LEP data in two years' time, 
illuminated by the fresh light of a Tevatron surprise. 

TABLE I: Limits on cross section × branching fraction for several thesis-level analyses performed using Quaero. All final states are inclusive in the number of additional jets. The fraction of the signal sample satisfying Quaero's selection criteria is denoted ε_sig; b is the number of expected background events satisfying these criteria; and N_data is the number of events in the data satisfying these criteria. The subscripts on h, W′, Z′, and LQ denote assumed masses, in units of GeV. (From Ref. [?].)

This situation can be easily imagined. The LEP II data set is sufficiently large that there are certainly many 3σ discrepancies. Suppose two models A and B attempt to explain a Tevatron hint, and Model A correctly predicts a 3σ excess in a subset of LEP data in which Model B predicts a 3σ deficit. The LEP data then favor Model A relative to Model B by a factor of more than a million to one. The data collected in LEP Run II therefore still have an extraordinary ability to discriminate among models that may be proposed to explain hints of new physics seen at the Tevatron.
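The size of this factor can be checked under a simplifying assumption that the text leaves implicit. Take a single Gaussian-distributed measurement x with standard deviation σ, with the observation lying 3σ above the standard model expectation μ_SM. Model A predicts μ_A = μ_SM + 3σ, which equals the observation, while Model B predicts μ_B = μ_SM − 3σ, so that x − μ_B = 6σ:

$$\frac{p(x\,|\,A)}{p(x\,|\,B)} = \frac{\exp\!\left[-\frac{(x-\mu_A)^2}{2\sigma^2}\right]}{\exp\!\left[-\frac{(x-\mu_B)^2}{2\sigma^2}\right]} = \exp\!\left[\frac{(6\sigma)^2}{2\sigma^2}\right] = e^{18} \approx 6.6\times 10^{7}$$

Poisson statistics and systematic uncertainties would modify the precise number, but not the conclusion that such a subset of LEP data discriminates very strongly between the two models.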

Unfortunately, the LEP II data are slowly being lost 
to us, as knowledgeable experimentalists move to other 
projects and retire. As time goes on, the potential bar- 
rier to analyzing these data in a meaningful way grows 
higher and higher, not because computers fail or because 
Fortran becomes obsolete (it won't), but because our col- 
leagues slowly lose their facility in the handling and un- 
derstanding of these data. The best way to ensure that 
the LEP II data are useful in the future is to package up 
— quickly, before it is lost — the knowledge contained 
in the minds of the collaborators on the four LEP exper- 
iments into an algorithm that can perform meaningful 
future analyses of the LEP data. 



A. Interface 

Improvements to the initial design of Quaero are under development for LEP II and for Tevatron Run II, with potential application also to HERA I and II and the future LHC. The new Quaero is substantially more sophisticated, allowing not just the setting of cross section limits, but in fact the testing of any arbitrary model, enabling the precision measurement of parameters as well as searches for new phenomena.

FIG. 6: The Quaero interface under development for LEP II and Tevatron Run II. For a given model, the user provides the events the model predicts in e+e− collisions at √s ≈ 200 GeV and in p p̄ collisions at 1.96 TeV. These events, together with all appropriate standard model background processes, define the hypothesis to be tested.

The interface defined for LEP II and for Tevatron Run II is shown in Fig. 6. The new interface is similar to
that used at D0, but with all user options removed. The 
querying physicist should not need to specify the final 
state, explicit cuts, or useful variables — Quaero should 
be able to determine these itself. Testing a particular 
model against collider data should be as simple as pro- 
viding a model, in the form of expected events, and an 
email address to which the result should be sent. 
    data 1 189.2
    e+ 45.2 +0.11 0.21
    e- 47.3 -0.05 3.56
    b 46.0 -0.16 1.71
    b 48.2 -0.02 4.90
    uncl 3.3 +0.07 3.97 ;

FIG. 7: A sample LEP event in Quaero format. This data event has unit weight and was collected at √s = 189.2 GeV. The event contains four reconstructed objects: a positron, an electron, and two b-tagged jets. Each object is shown with its E (energy, in units of GeV), cos θ (cosine of the polar angle), and φ (azimuthal angle, in units of radians). The object uncl represents unclustered energy: energy visible in the detector but not clustered into any of the reconstructed objects.
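The following Python sketch reads an event laid out as in Fig. 7 and computes the e+e− invariant mass. Its reading of the format is an assumption, not a specification: a header line "kind weight √s", then one "name E cos θ φ" line per object, ending with ";". It further assumes the objects are massless. The class and function names are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class RecoObject:
    name: str         # e.g. "e+", "b", "uncl"
    energy: float     # GeV
    cos_theta: float  # cosine of the polar angle
    phi: float        # azimuthal angle, radians

    def four_vector(self):
        """(E, px, py, pz), treating the object as massless (an assumption)."""
        sin_theta = math.sqrt(max(0.0, 1.0 - self.cos_theta ** 2))
        return (self.energy,
                self.energy * sin_theta * math.cos(self.phi),
                self.energy * sin_theta * math.sin(self.phi),
                self.energy * self.cos_theta)


@dataclass
class Event:
    kind: str       # "data" for a data event
    weight: float
    sqrt_s: float   # GeV
    objects: list = field(default_factory=list)


def parse_event(text: str) -> Event:
    """Parse one event: header 'kind weight sqrt_s', then 'name E cos_theta phi'
    per object, terminated by ';' (layout assumed from Fig. 7)."""
    tokens = text.replace(";", " ").split()
    kind, weight, sqrt_s = tokens[0], float(tokens[1]), float(tokens[2])
    body = tokens[3:]
    if len(body) % 4:
        raise ValueError("each object needs a name and three numbers")
    objects = [RecoObject(body[i], float(body[i + 1]), float(body[i + 2]), float(body[i + 3]))
               for i in range(0, len(body), 4)]
    return Event(kind, weight, sqrt_s, objects)


def invariant_mass(*objs: RecoObject) -> float:
    total = [0.0, 0.0, 0.0, 0.0]
    for obj in objs:
        for k, component in enumerate(obj.four_vector()):
            total[k] += component
    E, px, py, pz = total
    return math.sqrt(max(0.0, E ** 2 - px ** 2 - py ** 2 - pz ** 2))


SAMPLE = """data 1 189.2
e+ 45.2 +0.11 0.21
e- 47.3 -0.05 3.56
b 46.0 -0.16 1.71
b 48.2 -0.02 4.90
uncl 3.3 +0.07 3.97 ;"""

event = parse_event(SAMPLE)
leptons = [o for o in event.objects if o.name in ("e+", "e-")]
m_ee = invariant_mass(*leptons)
print(f"{event.kind} event at sqrt(s) = {event.sqrt_s} GeV, m_ee = {m_ee:.1f} GeV")
```

Under the massless approximation, the e+e− pair in this event comes out at about 92 GeV, close to the Z mass.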



B. Algorithm 

The back-end interface between Quaero and any ex- 
periment wishing to use Quaero has also been stream- 
lined. An experiment needs to provide four things: 

Data. The events seen in the data, the reconstructed objects (e±, μ±, τ±, γ, j, b) in those events, and the four-vectors of those objects. An example of a data event at LEP is shown in Fig. 7.

Backgrounds. Events predicted from the stan- 
dard model, the objects in those events, and the 
four-vectors of those objects. 

Systematic errors. Sources of systematic error, 
and their effect on each of the four-vector quanti- 
ties. 



Detector simulation. A simulation of the detector; this can vary from a fast parametrization to a full GEANT-based simulation.

Quaero takes the events provided to it, runs them through the detector simulation for each experiment, and partitions the resulting events into exclusive final states. For each final state, Quaero constructs a low-dimensional variable space from the variables in which the standard model and proposed-model distributions differ most; its dimensionality is limited by the number of Monte Carlo events at hand. This variable space is binned, and a likelihood ratio

$$\mathcal{L}(H) = \frac{p(\mathcal{D}\,|\,H)}{p(\mathcal{D}\,|\,\mathrm{SM})} \qquad (1)$$

is computed: the probability of seeing the data 𝒟 given the proposed hypothesis H, divided by the probability of seeing the data given the standard model SM. Because the final states are exclusive, the likelihood ratios calculated for the individual final states can be combined by multiplication, yielding the likelihood ratio for all relevant data.

Systematic errors are introduced in a straightforward and intuitive way that can be made arbitrarily detailed. For each number in each event, the effect of each source of systematic error can be assigned. These systematic errors are then propagated into the final likelihood by numerical integration. No Gaussian assumptions or convenient approximations need be made. The combination of results among experiments can be handled similarly: the way in which systematic errors are assigned lends itself to an intuitive specification of the correlation of systematic errors among various experiments.

At the same time, we achieve significantly more robust results, because the same algorithm and code are used for all measurements. Three hundred successful Quaero analyses lead to increased confidence in the result of the three hundred and first; the analogous statement does not hold if the analyses are performed by three hundred uncorrelated graduate students.

The input to Quaero is therefore simple, being just a set of events defining the proposed model, and the output is a single number, the likelihood ratio of Eq. (1). A large number indicates that the model is favored relative to the standard model; a small number indicates that the model is disfavored relative to the standard model. In addition, Quaero is currently configured to return plots of the distributions of all quantities that contribute significantly to the final number returned.

An International Research Fellowship from the National Science Foundation has assisted initial efforts toward the publication of LEP data using Quaero. Prototypes are currently under construction within the ALEPH and L3 collaborations; policy decisions will follow the evaluation of these prototypes.
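The binned likelihood ratio of Eq. (1) can be sketched in Python, including combination across exclusive final states and numerical integration over a systematic error. This is a hypothetical illustration, not Quaero's implementation. It assumes a single background-normalization uncertainty shared by all final states, a Gaussian prior on that uncertainty truncated at zero, and Monte Carlo integration. The dictionary keys and function names are invented for the example.

```python
import math
import random


def poisson_log_pmf(n: int, mu: float) -> float:
    return n * math.log(mu) - mu - math.lgamma(n + 1)


def log_mean_exp(values):
    """log of the mean of exp(values), computed stably."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def log_p_data(final_states, with_signal: bool, bkg_scale: float) -> float:
    """log p(D | hypothesis, nuisance): a sum of logs over the bins of all
    exclusive final states, i.e. a product of the per-final-state likelihoods."""
    total = 0.0
    for fs in final_states:
        for n, b, s in zip(fs["data"], fs["bkg"], fs["sig"]):
            mu = b * bkg_scale + (s if with_signal else 0.0)  # requires b > 0
            total += poisson_log_pmf(n, mu)
    return total


def likelihood_ratio(final_states, bkg_sigma=0.1, n_samples=2000, seed=0):
    """L(H) = p(D|H) / p(D|SM), each marginalized by Monte Carlo integration
    over one background-normalization factor shared by all final states
    (hypothetical Gaussian prior, mean 1, truncated at zero)."""
    rng = random.Random(seed)
    scales = []
    while len(scales) < n_samples:
        k = rng.gauss(1.0, bkg_sigma)
        if k > 0:
            scales.append(k)
    log_num = log_mean_exp([log_p_data(final_states, True, k) for k in scales])
    log_den = log_mean_exp([log_p_data(final_states, False, k) for k in scales])
    return math.exp(log_num - log_den)


# Hypothetical example: two exclusive final states, each already binned.
final_states = [
    {"data": [12, 5, 3], "bkg": [10.0, 4.0, 0.5], "sig": [0.5, 1.0, 2.0]},
    {"data": [7, 2], "bkg": [6.5, 1.5], "sig": [0.2, 1.5]},
]
print(f"L(H) = {likelihood_ratio(final_states):.2f}")
```

Because the nuisance factor is shared, the correlation of this systematic error across final states is built in. Extending the sketch to one nuisance parameter per source of error, with separate parameters for uncorrelated sources, follows the same pattern.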



IV. SUMMARY 

Quaero is by no means a panacea. It provides no 
help whatsoever in achieving a detailed understanding 
of instrumental features in the data or inadequacies in 
the detector simulation and background estimates. It 
does not allow an exploration of the data for evidence 
of more vaguely-defined hypotheses; for this, a different 
algorithm is required [?]. Quaero's sole
function is to turn an existing understanding of collider 
data and their backgrounds into statements about the 
underlying physical theory by enabling efficient tests of 
particular hypotheses against those data. 

Alternatively, Quaero can be thought of as a method 
for publishing the results of analyses — or the data them- 
selves — together with the intelligence required to make 
meaningful use of those data. Physicists concerned about 
misuse of their data should realize that the system that 
has been in place now for many decades allows for easy 
misinterpretation of a published histogram or table of 
numbers by colleagues outside the collaboration lacking 
the detailed, requisite knowledge for making proper use of those results. The collaboration in fact has far more control of its data if they are published using Quaero, since in this case the collaboration is in complete control of how the analysis is done. By completely eliminating all user options, Quaero renders the querying physicist incapable of making a mistake. The entire burden of the analysis rests with the collaboration, as packaged into Quaero.

The idea for Quaero began with the recognition that 
most high-pT analyses can be automated. The implemen- 
tation of this idea has the potential for curing — or at 
least ameliorating — several painful aspects of our field. 

• By automating analyses, Quaero can serve up cus- 
tom exclusion plots on demand, reducing, if not 
eliminating, the need for these insipid plots at con- 
ferences. 

• Quaero allows the publication of data in their full 
dimensionality, unlimited by the two dimensions of 
a sheet of paper. 

• Quaero obviates the need for "uncorrecting" or 
"unsmearing" procedures by allowing input at the 
level of theory, but making the comparison to data 
at the level of what is seen in the detector. 

• With the high level optimization of the analysis 
completely prescribed by the Quaero code, leav- 
ing no room for human intervention, the threat of 
human bias influencing the results of analyses is 
greatly reduced. 

• Automation of decisions such as the choice of vari- 
ables and analysis technique reduces the time re- 
quired to perform an analysis by orders of magni- 
tude, with a corresponding savings of manpower. 

• The numerical propagation of systematic errors en- 
sures a much more rigorous handling of systematics 
than is typically achieved, and the ability to assign 
the effect that each source of systematic error has 
on every single number in the analysis provides a 
much more intuitive way to think about their as- 
signment in the first place. 

• The accuracy of the results obtained, while never guaranteed, is far more certain when using code that has performed sensibly in a number of previous trials than when using code validated by only a handful of individuals in the context of a single analysis.

• Combining results from different final states and 
different experiments requires a high degree of cre- 
ativity when working from the finished results; 
combining results with Quaero, which performs 
the analysis from scratch and hence has access to 
all information needed to make a meaningful com- 
bination, is straightforward. 

A proof of principle of the Quaero idea has been 
achieved, and used to make a subset of D0 Run I data 
publicly available. An improved algorithm with much 
wider application is under development, and being pro- 
totyped at LEP and at the Tevatron. 

Quaero has the potential for putting the LEP II data 
at your fingertips, on the web. Given the tens of thou- 
sands of man years and billions of Swiss francs spent to 
collect these data, and the fact that we are unlikely to 
have e+e− collisions at energies above 200 GeV for at least
another decade, the desirability of packaging the LEP 
data in this form should be beyond question. This goal 
is kept alive in the evenings and on weekends by one per- 
son on ALEPH and one person on L3. 